Parsed Corpora for Linguistics

Authors

  • Gertjan van Noord
  • Gosse Bouma
Abstract

Knowledge-based parsers are now accurate, fast and robust enough to be used to obtain syntactic annotations for very large corpora fully automatically. We argue that such parsed corpora are an interesting new resource for linguists. The argument is illustrated by means of a number of recent results which were established with the help of parsed corpora.

Similar resources

Extracting Syntax Statistics from Large Corpora of Written English

The field of linguistics has seen a growing interest in the statistics of everyday language. In studying how we acquire language and why some of its aspects are more difficult for us than others, it is critical to understand the linguistic environment to which we are exposed. However, gathering statistics over syntactic structures, even with a syntactically tagged corpus, can be difficult and t...

An Experiment On Learning Appropriate Selectional Restrictions From A Parsed Corpus

We present a methodology to extract Selectional Restrictions at a variable level of abstraction from phrasally analyzed corpora. The method relies on the use of a wide-coverage noun taxonomy and a statistical measure of the co-occurrence of linguistic items. Some experimental results about the performance of the method are provided.

What might a corpus of parsed spoken data tell us about language?

This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. involving both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory...

Detecting Errors in Automatically-Parsed Dependency Relations

We outline different methods to detect errors in automatically-parsed dependency corpora, by comparing so-called dependency rules to their representation in the training data and flagging anomalous ones. By comparing each new rule to every relevant rule from training, we can identify parts of parse trees which are likely erroneous. Even the relatively simple methods of comparison we propose sho...

A New Machine Learning Algorithm for Neoposy: coining new Parts of Speech

Unsupervised Natural Language Learning, the use of machine learning algorithms to extract linguistic patterns from raw, un-annotated text, is a growing research subfield; for examples, see Proceedings of annual conferences of CoNLL: Computational Natural Language Learning, or the membership list of ACLSIGNLL, the Association for Computational Linguistics – Special Interest Group in Natural Lang...

Publication date: 2009